
hdWGCNA Analysis

Author: SeekGene
Updated: 2026-01-26

Preface

IMPORTANT

hdWGCNA (high-dimensional Weighted Gene Co-expression Network Analysis) is a tool for constructing gene co-expression networks and identifying functional modules in single-cell transcriptomics. By integrating cell type-specific gene expression patterns, it helps researchers understand the gene regulatory mechanisms underlying cell heterogeneity and identify functional gene modules specific to cell types or states.

In single-cell research, we not only focus on expression differences of individual genes but also want to understand the coordinated regulatory relationships between genes. By constructing gene co-expression networks and identifying functionally related gene modules, hdWGCNA provides important clues for analyzing cell functions and development.

Core Functions of hdWGCNA

  • Gene Co-expression Network Construction: Construct cell type-specific gene co-expression networks based on gene expression correlations
  • Functional Module Identification: Identify gene modules that are co-expressed in specific cell types
  • Module Characteristic Analysis: Calculate Module Eigengenes and evaluate their biological significance
  • Module Specificity Analysis: Identify gene modules that are specifically expressed in certain cell types

This document aims to provide a detailed technical guide to hdWGCNA for single-cell researchers, covering its basic principles, operation methods on the SeekSoul™ Online platform, result interpretation, practical cases, and common questions, helping you quickly master and apply this tool.


Theoretical Basis of hdWGCNA

Core Principles

The core idea of hdWGCNA is to identify co-expression patterns between genes, construct cell type-specific gene co-expression networks from them, and detect functionally related gene modules. This process can be summarized into the following main steps:

  1. Gene Selection: Select gene sets suitable for co-expression analysis based on gene expression characteristics
  2. Network Construction: Construct weighted gene co-expression networks based on gene expression correlations
  3. Module Identification: Identify gene modules in the network through hierarchical clustering and other methods
  4. Module Characteristic Analysis: Calculate Module Eigengenes for each module
  5. Biological Significance Evaluation: Evaluate the biological functions of modules through enrichment analysis and other methods

Key Algorithm Details

Weighted Gene Co-expression Network Construction

  • Principle: Calculate the connection strength between genes based on gene expression correlations to construct a weighted undirected graph
  • Method: Use the soft threshold method to convert gene expression correlations into connection strengths, with the formula $a_{ij} = |\mathrm{cor}(x_i, x_j)|^{\beta}$
  • Advantage: Compared to unweighted networks, weighted networks better preserve information about correlations between genes
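
As a minimal sketch of the soft-threshold formula, the following Python snippet computes an unsigned adjacency matrix for a toy expression matrix. The function names (`pearson`, `soft_threshold_adjacency`), the toy values, and the choice of β = 6 are illustrative assumptions, not part of hdWGCNA or the platform.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def soft_threshold_adjacency(expr, beta):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr is a list of gene vectors (one value per cell or metacell).
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i, g):
            a = 1.0 if i == j else abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a  # the network is undirected
    return adj

# Toy data (made up for illustration): three genes measured in four cells.
expr = [
    [1.0, 2.0, 3.0, 4.0],  # gene A
    [2.0, 4.0, 6.0, 8.1],  # gene B, almost proportional to gene A
    [4.0, 1.0, 3.0, 2.0],  # gene C, weakly anti-correlated with gene A
]
adj = soft_threshold_adjacency(expr, beta=6)
```

Raising the absolute correlation to a power keeps the strong A-B link close to 1 while shrinking the weak A-C link (|r| = 0.4) to about 0.004, which is how soft thresholding suppresses noise without discarding edges outright.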

Module Identification

  • Principle: Identify gene modules through dynamic tree cutting algorithm
  • Method:
    • Calculate the adjacency matrix and, from it, the topological overlap matrix (TOM)
    • Perform hierarchical clustering on the TOM-based dissimilarity
    • Use dynamic tree cutting algorithm to identify clustering branches
    • Assign color identifiers to each module
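
WGCNA-style workflows usually cluster genes on a dissimilarity derived from topological overlap rather than on raw adjacency. The sketch below computes the unsigned topological overlap, TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), where l_ij sums a_iu * a_uj over the other genes u and k_i is the connectivity of gene i. The function name and the toy adjacency matrix are assumptions for illustration; hierarchical clustering and dynamic tree cutting are not shown.

```python
def topological_overlap(adj):
    """Unsigned topological overlap matrix from a symmetric adjacency matrix."""
    g = len(adj)
    # Connectivity of each gene, excluding its self-connection.
    k = [sum(adj[i][u] for u in range(g) if u != i) for i in range(g)]
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            # Strength of the neighbours shared by genes i and j.
            shared = sum(adj[i][u] * adj[u][j]
                         for u in range(g) if u != i and u != j)
            t = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = t
    return tom

# Toy adjacency (made up): genes 0-2 are tightly connected, gene 3 is peripheral.
adj = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.1],
    [0.7, 0.9, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]
tom = topological_overlap(adj)
# Dissimilarity that a hierarchical clustering step would take as input.
dissim = [[1.0 - t for t in row] for row in tom]
```

Genes that share many strong neighbours end up with a small dissimilarity and therefore fall on the same branch of the dendrogram.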

Module Eigengenes

  • Definition: The first principal component of all gene expressions in a module
  • Significance: Represents the expression characteristics of the entire module
  • Application: Used for correlation analysis between modules and downstream biological analysis
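
To make the definition concrete, here is a simplified sketch that computes a module eigengene as the first principal component of standardized expression, using power iteration on the gene-by-gene correlation matrix. The function names, the toy data, and the sign convention (orienting the eigengene to follow average expression) are assumptions for illustration; hdWGCNA's own computation may differ in details such as scaling and batch handling.

```python
import math

def standardize(v):
    """Center a vector and scale it to unit sample standard deviation."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in v) / (n - 1))
    return [(x - m) / sd for x in v]

def module_eigengene(expr, iters=200):
    """First principal component score per cell for one module.

    expr is a list of gene vectors (one value per cell). Assumes the start
    vector is not orthogonal to the leading eigenvector.
    """
    z = [standardize(gene) for gene in expr]
    g, n = len(z), len(z[0])
    # Gene-by-gene correlation matrix of the standardized data.
    c = [[sum(z[i][s] * z[j][s] for s in range(n)) / (n - 1)
          for j in range(g)] for i in range(g)]
    v = [1.0 / math.sqrt(g)] * g
    for _ in range(iters):  # power iteration towards the leading eigenvector
        w = [sum(c[i][j] * v[j] for j in range(g)) for i in range(g)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each cell onto the gene loadings.
    me = [sum(v[i] * z[i][s] for i in range(g)) for s in range(n)]
    # Orient the component so that it follows average module expression.
    avg = [sum(z[i][s] for i in range(g)) / g for s in range(n)]
    if sum(a * b for a, b in zip(me, avg)) < 0:
        me = [-x for x in me]
    return me

# Toy module (made up): three co-expressed genes measured in five cells.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 4.0, 5.0, 8.0, 11.0],
    [1.0, 3.0, 2.0, 5.0, 6.0],
]
me = module_eigengene(expr)
```

The result is one value per cell, which can then be correlated with other module eigengenes or compared between cell types.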

Evaluation of Biological Significance of Modules

Functional Enrichment Analysis

  • Method: Perform GO/KEGG and other functional enrichment analyses on genes in each module
  • Application: Infer potential biological functions of modules

Module and Cell Type Association Analysis

  • Method: Calculate the overlap between module genes and cell type marker genes
  • Application: Identify functional modules specific to cell types
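
One common way to quantify such an overlap is a Jaccard index together with a one-sided hypergeometric test. The sketch below is an illustration under stated assumptions: the function name, the gene lists, and the universe size of 2,000 genes are hypothetical, and the platform may use a different statistic.

```python
from math import comb

def overlap_stats(module_genes, marker_genes, universe_size):
    """Shared genes, Jaccard index, and hypergeometric P(X >= observed overlap)."""
    module, markers = set(module_genes), set(marker_genes)
    shared = module & markers
    k, m, n = len(shared), len(module), len(markers)
    # Probability of drawing at least k markers when picking m genes
    # at random from a universe that contains n markers.
    tail = sum(comb(n, x) * comb(universe_size - n, m - x)
               for x in range(k, min(m, n) + 1))
    p_value = tail / comb(universe_size, m)
    jaccard = k / len(module | markers)
    return sorted(shared), jaccard, p_value

# Hypothetical gene lists for illustration only.
module_genes = ["CD274", "IDO1", "ARG1", "MRC1"]
marker_genes = ["CD274", "IDO1", "CD68"]
shared, jaccard, p_value = overlap_stats(module_genes, marker_genes, universe_size=2000)
```

A small p-value indicates that the module shares more genes with the marker list than expected by chance, given the size of the gene universe that was tested.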

SeekSoul™ Online Operation Guide

On SeekSoul™ Online, the hdWGCNA analysis process is designed to be intuitive and easy to use. You don't need to write code; you can complete the analysis through the parameter configuration interface.

Preparation Before Analysis

IMPORTANT

The success of hdWGCNA analysis largely depends on the quality of input data and the rationality of biological questions. Before starting the analysis, please make sure:

  1. Data has been preprocessed: Your single-cell data has undergone standard quality control, dimensionality reduction, clustering, and cell type annotation.
  2. Appropriate cell subpopulations have been selected: hdWGCNA analysis should be performed in biologically meaningful cell subpopulations, such as annotated cell types or functionally related cell clusters.
  3. Data scale is moderate: For datasets with more than tens of thousands of cells, it is recommended to enable Downsample to avoid running out of memory.

Parameter Details

The following table details the main parameters and their descriptions of the hdWGCNA analysis module on SeekSoul™ Online.

| Interface Parameter | Description |
| --- | --- |
| Task Name | The name of this analysis task. Must start with an English letter; can contain English letters, numbers, underscores, and Chinese characters. |
| Group.by | The column name in meta, consistent with grouping factors in other processes, e.g., Cellannotation. Required. |
| Cell Type | The values in the meta column selected by the grouping factor, e.g., T, B... Required. |
| Species | human or other. |
| Filter by | The column name in meta, consistent with grouping factors in other processes. Optional; mutually exclusive with Group.by. |
| Filter | The values in the meta column selected by the filter factor. Multiple selection allowed. Optional. |
| Feature selection method | Method for selecting genes for co-expression network analysis: variable, fraction, or custom. variable: use the highly variable genes stored in the Seurat object. fraction: use genes expressed in a certain proportion of cells in a group. custom: use a custom gene set; gene_list must be specified after selection. |
| Percentage of Genes | Displayed when the feature selection method is fraction. Input box, number from 0 to 1 (two decimal places), default 0.2. |
| Custom Gene Sets | Displayed when the feature selection method is custom. |
| Scoring Method | Seurat or UCell. Default UCell. |
| Reduction | pca or harmony. Drop-down box, single selection, default pca. |
| K Clusters | Input box, integer from 25 to 75, default 25. |
| Downsample | Drop-down box, single selection of True or False, default False. |
| Downsample_num | Number, entered manually, default 1000 (displayed when Downsample is True). |
| Note | Custom remark information. |
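
The constraints in the parameter table can be checked before submitting a task. The following sketch validates a parameter dictionary against the documented ranges; the dictionary keys (`task_name`, `group_by`, `k`, and so on) and the function name are hypothetical and do not correspond to an actual SeekSoul™ Online API.

```python
import re

# English letters, digits, underscores, and Chinese characters,
# starting with an English letter.
TASK_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_\u4e00-\u9fff]*$")

def validate_params(params):
    """Return a list of problems found in a hypothetical parameter dict."""
    errors = []
    if not TASK_NAME_RE.match(params.get("task_name", "")):
        errors.append("task_name must start with an English letter")
    if not params.get("group_by") or not params.get("cell_type"):
        errors.append("group_by and cell_type are required")
    method = params.get("feature_selection", "variable")
    if method not in {"variable", "fraction", "custom"}:
        errors.append("feature_selection must be variable, fraction, or custom")
    if method == "fraction" and not 0 <= params.get("fraction", 0.2) <= 1:
        errors.append("fraction must be between 0 and 1")
    if method == "custom" and not params.get("gene_list"):
        errors.append("custom feature selection needs a gene_list")
    if params.get("scoring", "UCell") not in {"Seurat", "UCell"}:
        errors.append("scoring must be Seurat or UCell")
    if params.get("reduction", "pca") not in {"pca", "harmony"}:
        errors.append("reduction must be pca or harmony")
    k = params.get("k", 25)
    if not (isinstance(k, int) and 25 <= k <= 75):
        errors.append("k must be an integer from 25 to 75")
    return errors

ok = validate_params({"task_name": "Tcell_run1",
                      "group_by": "Cellannotation", "cell_type": "T"})
bad = validate_params({"task_name": "1run", "group_by": "Cellannotation",
                       "cell_type": "T", "k": 10})
```

Here `ok` is an empty list, while `bad` reports the invalid task name and the out-of-range K value.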

Important Notes

CAUTION

  • Large Dataset Processing: When the total number of cells exceeds tens of thousands and the Downsample parameter is set to False, the analysis may fail due to insufficient memory. It is strongly recommended to enable Downsample in this case.
  • Metadata Specification: Please ensure that the metadata column names and content in the RDS file do not contain Chinese characters or special characters (such as &), otherwise it may cause process errors.
  • Species Matching: Ensure the selected species matches the actual data, otherwise it will affect the accuracy of the enrichment analysis database.
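
For intuition about what downsampling does, the sketch below draws a reproducible random subset of cell identifiers. It assumes simple random sampling without replacement over all cells; whether the platform samples overall or per group is not stated here, and the function name and seed are hypothetical.

```python
import random

def downsample_cells(cell_ids, downsample_num=1000, seed=0):
    """Return at most downsample_num cell IDs, sampled without replacement."""
    cell_ids = list(cell_ids)
    if len(cell_ids) <= downsample_num:
        return cell_ids  # nothing to do for small datasets
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    return rng.sample(cell_ids, downsample_num)

# Hypothetical barcodes for illustration.
cells = [f"cell_{i}" for i in range(5000)]
subset = downsample_cells(cells, downsample_num=1000)
```

Fewer cells mean a smaller expression matrix to hold in memory, at the cost of some statistical power for rare populations.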

Operation Process

  1. Enter Analysis Module: Navigate to the "Advanced Analysis" module on SeekSoul™ Online and select "hdWGCNA".
  2. Create New Task: Name your analysis task and select the sample or project to be analyzed.
  3. Configure Parameters: Select the cell type, grouping information, etc. according to the above guidelines.
  4. Submit Task: After confirming the parameters are correct, click the "Submit" button and wait for the analysis to complete.
  5. Download and View: After the analysis is completed, download and view the generated analysis report and result files in the task list.

Result Interpretation

The hdWGCNA analysis report contains rich charts and data files. The following is a detailed interpretation of the core results.

Result File List

| File Name | Content Description |
| --- | --- |
| *_KME_modules.csv | KME (Module Membership) values of genes in each module, reflecting the correlation between genes and the module. |
| *_cluster_allDMEs.csv | Differential analysis results of module eigengenes between different cell types. |
| *_cluster_findallmarkers.csv | Cell type marker genes identified by FindAllMarkers. |
| *_hdWGCNA_findallmarkers_overlap_result.csv | Overlap results between hdWGCNA module genes and FindAllMarkers marker genes. |
| *_modules_GOenrich_result.txt | GO functional enrichment analysis results of genes in each module. |
| modules_count.txt | Number of identified modules. |

Module Identification and Characteristic Analysis

Soft Threshold Selection Plot

  • Chart Interpretation: Shows the scale-free topology fitting index and network connectivity measures (such as average connectivity) at each candidate soft threshold.
  • Selection Principle: Select a soft threshold with high fitting index and moderate average connectivity.
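
A common rule of thumb in WGCNA-style analyses is to take the lowest power whose scale-free fitting index reaches about 0.8. The sketch below applies that rule to a table of candidate powers; the function name, the 0.8 cut-off as a default, and the numbers in the table are assumptions for illustration, not platform output.

```python
def pick_soft_power(fit_table, min_fit=0.8):
    """Pick the lowest power whose fitting index reaches min_fit.

    fit_table holds (power, fit_index, mean_connectivity) tuples. If no power
    reaches min_fit, fall back to the power with the highest fitting index.
    """
    for power, fit, _mean_k in sorted(fit_table):
        if fit >= min_fit:
            return power
    return max(fit_table, key=lambda row: row[1])[0]

# Made-up values: fit rises and mean connectivity falls as the power grows.
fit_table = [
    (2, 0.21, 850.0),
    (4, 0.55, 310.0),
    (6, 0.74, 120.0),
    (8, 0.83, 52.0),
    (10, 0.88, 25.0),
    (12, 0.90, 13.0),
]
power = pick_soft_power(fit_table)
```

In this toy table the rule selects a power of 8. Choosing the lowest qualifying power, rather than the best-fitting one, keeps average connectivity from dropping further than necessary.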

Module Dendrogram

  • Chart Interpretation: Shows gene hierarchical clustering results, with different colors representing different modules.
  • Grey Module: A collection of genes not assigned to any module.

Module Correlation Heatmap

  • Chart Interpretation: Shows the correlation between the module eigengenes of different modules.
  • Color Meaning: Purple indicates positive correlation, green indicates negative correlation.

Module Functional Analysis

Module KME Value Plot

  • Chart Interpretation: Shows the top 10 genes with the highest KME values in each module.
  • KME Value Meaning: The correlation between a gene's expression and the module eigengene; higher values indicate stronger module membership.
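
As a small illustration of how KME-based ranking works, the sketch below correlates each gene with a given module eigengene and returns the top-ranked genes. The function names, gene labels, and values are hypothetical; in practice the KME values are read from the result files rather than recomputed.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def rank_by_kme(expr_by_gene, eigengene, top_n=10):
    """Return (gene, kME) pairs sorted from highest to lowest kME."""
    kme = {gene: pearson(values, eigengene)
           for gene, values in expr_by_gene.items()}
    return sorted(kme.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Made-up module eigengene and expression values across five cells.
eigengene = [-1.0, -0.5, 0.0, 0.5, 1.0]
expr_by_gene = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],  # follows the eigengene exactly
    "GENE_B": [2.0, 1.0, 4.0, 3.0, 5.0],  # follows it loosely
    "GENE_C": [5.0, 3.0, 4.0, 1.0, 2.0],  # runs against it
}
top = rank_by_kme(expr_by_gene, eigengene, top_n=2)
```

Genes at the top of this ranking are the usual hub gene candidates for a module.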

Module Network Plot

  • Chart Interpretation: Shows the co-expression network structure of genes in the module.
  • Node Size: Represents the KME value of the gene.
  • Edge Thickness: Represents the connection strength between genes.

Module Activity Visualization

UMAP Plot

  • Chart Interpretation: Shows the spatial distribution of module activity on the UMAP plot.
  • Color Meaning: Darker colors indicate higher activity, lighter colors indicate lower activity.

Violin Plot

  • Chart Interpretation: Shows the distribution of module activity in different cell types.
  • Application: Identify functional modules specific to cell types.

Dot Plot

  • Chart Interpretation: Shows the average level and expression proportion of module activity in different cell types.
  • Color Meaning: Blue indicates high activity, light colors indicate low activity.
  • Dot Size: Represents the proportion of cells in the cell type in which the module is active.

Module Differential Analysis

Differential Heatmap

  • Chart Interpretation: Shows the differences in module activity between different cell types.
  • Color Meaning: Red indicates upregulation, green indicates downregulation.
  • Asterisk Annotation: Indicates the level of significance.

Volcano Plot

  • Chart Interpretation: Shows the results of differential analysis of module activity in different cell types.
  • X-axis: log2 fold change
  • Y-axis: -log10(adjusted p-value)

Functional Enrichment Analysis

GO Enrichment Dot Plot

  • Chart Interpretation: Shows the GO functional enrichment results of module genes.
  • X-axis: Enrichment score
  • Y-axis: GO terms
  • Color: Indicates enrichment significance
  • Dot Size: Represents the number of genes enriched in the GO term

Application Cases

Case 1: Identification of Immune Suppression Modules in Tumor-Infiltrating Immune Cells

  • Literature: Morabito S, Miyamoto A, Pochareddy S, et al. bioRxiv. 2022.
  • Background: Researchers wanted to identify gene modules related to immune escape and suppression in tumor tissues, focusing on tumor-infiltrating T cells and macrophage subsets.
  • Analysis Strategy: Run hdWGCNA separately on annotated T cell subsets and macrophage subsets, select the variable gene set and use UCell for module activity scoring; then perform KME screening on modules and conduct GO/KEGG enrichment analysis, and finally perform differential analysis of module activity between tumor and control samples.
  • Key Findings:
    1. Identified an "immune suppression module" specifically upregulated in tumor-infiltrating macrophages, which contains PD-L1(CD274), IDO1, and several metabolism-related genes.
    2. This module was significantly upregulated in advanced samples (differential analysis FDR < 0.05) and highly overlapped with tumor-associated macrophage marker genes (KME>0.7).
    3. GO/KEGG results suggested that this module is enriched in pathways such as "immune suppression" and "tryptophan metabolism", suggesting potential immune regulatory mechanisms that can be used as therapeutic targets.

Case 2: Identification of Specific Functional Modules During Development

  • Literature: Langfelder P, Horvath S. BMC Bioinformatics. 2008.
  • Background: Study functional reprogramming of cell types between different time points during embryonic development, hoping to identify early development-specific modules.
  • Analysis Strategy: Group cells by time point, use the fraction method to select genes expressed in at least 20% of cells, construct a weighted co-expression network and use dynamic tree cutting to identify modules; perform time-series Module Eigengene analysis and enrichment analysis on the modules.
  • Key Findings:
    1. Found a module highly expressed in early development, containing various stemness maintenance-related genes (such as SOX2, NANOG-related pathway genes), whose Module Eigengene decreased over time.
    2. GO enrichment of this module showed entries related to "stem cell maintenance" and "cell cycle regulation", supporting its functional role in early development.
    3. Overlap with the single-cell annotation results confirmed that this module is enriched in progenitor/stem cell subsets.

Case 3: Identification and Verification of Drug Response-Related Modules

  • Literature: Ricchiuti V, Li J, Shi J, et al. J Transl Med. 2021.
  • Background: Compare treated and control cells to find drug response-related gene modules, and verify key genes through in vitro experiments.
  • Analysis Strategy: Merge Seurat objects of the treated and control groups, run hdWGCNA and calculate Module Score for each cell; perform differential analysis of module activity between the treated and control groups, and select genes with high KME values for subsequent experimental verification.
  • Key Findings and Verification:
    1. Identified a significantly upregulated "stress response module" containing several drug-metabolizing enzymes and stress-related transcription factors.
    2. The top 3 genes with the highest KME values in the module were selected for qPCR and Western blot verification, and the experimental results were consistent with the changes in module activity, supporting the biological reliability of the hdWGCNA results.

Notes and Best Practices

TIP

Avoid overinterpretation: hdWGCNA results are computational inferences based on transcriptomic data and do not equal real regulatory relationships. Any key findings need to be confirmed by subsequent biological experiments.


Frequently Asked Questions (FAQ)

Q1: How long does hdWGCNA analysis take?

A: The analysis time depends on the data scale and computing resource configuration. Generally speaking:

  • Small datasets (1,000-5,000 cells): 1-2 hours
  • Medium datasets (5,000-20,000 cells): 2-6 hours
  • Large datasets (>20,000 cells): 6-24 hours or longer

It is recommended to enable Downsample to speed up the analysis.

Q2: What is the significance of KME values and Module Scores?

A:

  • KME (Module Membership): The correlation between a gene's expression and the module eigengene. A higher KME value indicates that the gene is more strongly associated with the module.
  • Module Scores: The activity level of a module in a single cell, calculated using the UCell or Seurat method.

Q3: How to determine the biological significance of a module?

A: The biological significance of a module can be determined through the following methods:

  1. Functional enrichment analysis: Understand the potential functions of the module through GO/KEGG and other enrichment analyses
  2. Marker gene overlap: Overlap analysis with known cell type marker genes
  3. Differential analysis: Analyze the activity differences of the module between different cell types

Q4: How to verify the reliability of hdWGCNA analysis results?

A: The reliability of results can be verified through the following ways:

  1. Biological verification: Verify key module functions by combining known literature and databases
  2. Experimental verification: Verify key module genes through reporter gene experiments and other methods
  3. Cross-validation: Verify result consistency using different datasets or analysis methods

References

  1. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics. 2008 Dec 29;9:559. doi: 10.1186/1471-2105-9-559. PMID: 19116021; PMCID: PMC2631488.

  2. Zhang B, Horvath S. A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol. 2005 Aug 12;4:Article17. doi: 10.2202/1544-6115.1128. Epub 2005 Aug 12. PMID: 16646834.

  3. Morabito S, Miyamoto A, Pochareddy S, et al. Single-cell co-expression analysis identifies distinct regulatory programs in the developing human cortex. bioRxiv. 2022 Jan 1;2022.01.01.474662. doi: 10.1101/2022.01.01.474662.

  4. Miller JA, Cai C, Langfelder P, et al. Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinformatics. 2011 Jul 13;12:302. doi: 10.1186/1471-2105-12-302. PMID: 21752256; PMCID: PMC3152543.

  5. Ricchiuti V, Li J, Shi J, et al. A practical guide to single-cell RNA sequencing for biomedical research and clinical applications. J Transl Med. 2021 Sep 14;19(1):394. doi: 10.1186/s12967-021-02974-0. PMID: 34521462; PMCID: PMC8438992.
